Skills: Visualizing model results
You can use your model results to generate predictions for hypothetical cases, and this can be a useful way to visualize your model results.
Initial setup
I’ll start by loading my data and estimating my preferred model
R
For the R examples on this page, I’ll start by loading the data and
the packages I need, and by estimating my preferred model and saving it
as an model object called my_model.
library(tidyverse)
library(here)
my_data <- here("datasets",
"test.csv") |>
read_csv() |>
mutate(building = factor(building,
levels = c("Single-family",
"Duplex",
"Three-plus")))
my_model <- lm(log2(`sales-price`) ~ building +
rooms +
sch_quality +
log2(city_dist) +
sch_quality*building,
data = my_data)
summary(my_model)##
## Call:
## lm(formula = log2(`sales-price`) ~ building + rooms + sch_quality +
## log2(city_dist) + sch_quality * building, data = my_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3998 -0.5162 0.0072 0.5125 2.7573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.319563 0.066941 243.791 < 2e-16 ***
## buildingDuplex 0.011077 0.068136 0.163 0.87086
## buildingThree-plus 0.066305 0.067652 0.980 0.32706
## rooms -0.001922 0.007547 -0.255 0.79897
## sch_quality 0.397331 0.015119 26.280 < 2e-16 ***
## log2(city_dist) -0.207483 0.010751 -19.299 < 2e-16 ***
## buildingDuplex:sch_quality -0.073603 0.024252 -3.035 0.00241 **
## buildingThree-plus:sch_quality -0.027152 0.024072 -1.128 0.25937
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.761 on 9992 degrees of freedom
## Multiple R-squared: 0.1577, Adjusted R-squared: 0.1571
## F-statistic: 267.2 on 7 and 9992 DF, p-value: < 2.2e-16
Noticed that I’ve releveled my categorical variable so that “single-family” comes first (so that it will be used as my reference category). My model includes a base-two log transformation for the outcome (home sale price) and one of the predictors (distance to the city). It also includes an interaction between school quality and building type.
Excel
Here are the same model results in Excel:
Generating a set of hypothetical values
To visualize the relationships in your model, it may be helpful to generate a data set where one continuous predictor varies across its full range, and the others are held at their averages.
Generating a sequence of values across a range
You might start by generating a sequence of numbers that covers the full range of of your variables.
R
In my data set, I know that the city distance variable ranges from
about a quarter mile to 20 miles, which I can confirm using the
summary() function.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2731 2.8701 3.9791 4.3751 5.4798 20.0000
So a set of values that covers that full range might start at 0.25
and increase in increments of 0.25 until it reaches a value of 20. I can
use the seq() function to generate a set of numbers like
this.
Now I have that set of 80 numbers in a variable called
vary_dist.
## [1] 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00
## [13] 3.25 3.50 3.75 4.00 4.25 4.50 4.75 5.00 5.25 5.50 5.75 6.00
## [25] 6.25 6.50 6.75 7.00 7.25 7.50 7.75 8.00 8.25 8.50 8.75 9.00
## [37] 9.25 9.50 9.75 10.00 10.25 10.50 10.75 11.00 11.25 11.50 11.75 12.00
## [49] 12.25 12.50 12.75 13.00 13.25 13.50 13.75 14.00 14.25 14.50 14.75 15.00
## [61] 15.25 15.50 15.75 16.00 16.25 16.50 16.75 17.00 17.25 17.50 17.75 18.00
## [73] 18.25 18.50 18.75 19.00 19.25 19.50 19.75 20.00
Excel
In Excel, I can just start a column of numbers that in a sequence, and drag the bottom left corner of the second cell to repeat that pattern until I reach the maximum value I want.
Creating a dataset
Now you can use that set of varying values as part of a data set where everything else isi held constant (perhaps varying the categorical variable as well).
R
In R, you can use the tibble() function to construct a
data frame that has your varying values as one column and a constant
value in each of the other columns.
vary_dist_single_fam <- tibble(city_dist = vary_dist,
rooms = mean(my_data$rooms),
sch_quality = mean(my_data$sch_quality),
building = "Single-family")You might want to do this for each of the categories in your categorical variable.
vary_dist_duplex <- tibble(city_dist = vary_dist,
rooms = mean(my_data$rooms),
sch_quality = mean(my_data$sch_quality),
building = "Duplex")
vary_dist_triplex <- tibble(city_dist = vary_dist,
rooms = mean(my_data$rooms),
sch_quality = mean(my_data$sch_quality),
building = "Three-plus")And then combine them all using the rbind()
function.
Excel
Doing something similar in Excel is fairly straightforward and just involves some copying and pasting.